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^ ' Computing averages over a target probability density by statistical re-weighting of a set of 

samples with a different distribution is a strategy which is commonly adopted in fields as diverse 
f^ ' as atomistic simulation and finance. Here we present a very general analysis of the accuracy and 

efficiency of this approach, highlighting some of its weaknesses. We then give an example of how 
our results can be used, specifically to assess the feasibility of high-order path integral methods. 
(~| . We demonstrate that the most promising of these techniques - which is based on re-weighted 

^ i' sampling - is bound to fail as the size of the system is increased, because of the exponential 

growth of the statistical uncertainty in the re-weighted average. 

Qj ' Key words: statistical sampling, re-weighting, high-order path integrals. 
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c/3 . 1. Introduction 

>>■ 

Averaging is to the theoretical and computational sciences what measuring is to empirical 
science, and it is hard to imagine a problem with more general implications. Formally, the 
problem of computing an expectation value for a quantity a{x) over a given probability 
^ distribution p{x) can be stated as 

oo: , , 

a^ 

Very often, it turns out that x spans a very high dimensional space, which makes 
[~^ . it impossible to evaluate the integral on a grid of points. Instead, importance sampling 

^^ . algorithms [ll-Q are used which generate a set of points x^ distributed in accordance 

with the target probability p. Then, the expectation value can be computed as an average 
over that set of points, 



a{x)p{x)diX. (1.1] 



n 



(a),«a(r)=n-^Va(x(^)). (1.2) 



p n ■ " / J 

S^ ' 4=1 

fn ' 

The error in the estimate (|1.2p decreases with n"^'^, provided that the sample points 

are uncorrelated. In a number of circumstances obtaining uncorrelated samples from 

p is an exceedingly difficult problem, either because generating an individual point 

is computationally demanding, or because the sampling algorithm produces strongly 

correlated points, so that each one contributes very little to the reduction of the statistical 

error. 
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In these cases one is led to explore the possibility of performing re-weighted sampling, 
i.e., generating points according to a different distribution Pq{x), which is easier to sample 
efficiently, and then computing the average relative to p using a correction weight w{x) = 
p{x)/pq{x) involving the ratio of the probability densities: 

' (w) v^fi / (po) \ 

\'^/po 22i=iw[xl'^ 'j 

Many methods have been developed which are fundamentally based on the re-weighted 
average (jl.3p , which has found applications in physical chemistry [J-|o| , theoretical physics 
0) 3 1 economics and statistics @, 0| • Section [2] will be devoted to a thorough analysis of 
the statistical properties of re-weighted sampling, and we will discuss under quite general 
assumptions the conditions required to apply this technique successfully. 

Our analysis allows one to assess the efficacy of re- weighting in each of the contexts in 
which it is applied. One instance of an important problem which has recently been tackled 
by re-weighted sampling concerns the use of high-order path integration techniques in 
computer simulations which allow for the quantum nature of atomic motion. Conventional 
path integral (PI) methods I(]l4l2| involve a significant overhead compared to a purely 



classical treatment, and a considerable effort has been devoted to the quest for less 
computationally deman ding ap proaches. High order discretizations of the path have been 
available for some time 13l-[l5|. but it has only recently become apparent that one can 



circumvent the calculation of the bothersome second derivatives of the physical potential 
that arise in these discretizations by using re- weighted sampling (la . [l7l |. If successful, 
this approach would make high order path integration generally applicable and very 
attractive for many applications. 

To provide an example of the utility of the results established in Section 2, we shall 
therefore proceed in Section 3 to focus on the role of re-weighted sampling in high order 
PI simulations. We shall show in particular that this form of path integration is eventually 
bound to fail because of the reduced statistical efficiency introduced by the re- weighting. 
Simulations of small clusters and/or mildly quantum mechanical problems constitute a 
niche in which this approach might be beneficial (l7| , but its performance is bound to 
degrade for larger systems. 



2. Statistics of re-weighting 

A key issue which is often overlooked is that averaging according to (jl.3p will have a 
lower statistical efficiency than sampling p directly as in ()1.2p . i.e., for the same number 
of uncorrelated sample points, the statistical error in the re-weighted average will often 
be larger. Working out a simple, analytical expression for this drop in sampling efficiency 
would be extremely useful, as it would allow one to make an informed choice as to whether 
the computational gain of sampling pQ is overshadowed by the concomitant statistical 
inefficiency. 

In order to evaluate such an estimate, let us first simplify the notation of Eq. (|1.3p 
by labelling Oj the i-th. sample for the observable a, and Wi the ratio of the probability 
densities p{xi)/po{xi). Averages with respect to the target distribution p will be written 
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as (•) , while averages relative to pQ will be written as (•). With this notation, 

E, ■r-^ _ (aw) . . . . 

aiWi } Wi, and lim a„ = ^^ = (a) . (2.1 

■^ — ' n—^oo (w) ^ 



H 



Our objective is to unravel the statistical properties of the weighted average (|2.ip for a 
finite number n of sample points, and to compare them with those resulting from sampling 
the target distribution directly. We will assume that different samples are completely 
uncorrelated, since different sampling distributions and algorithms can be compared on 
the basis of the computational cost of generating a new uncorrelated sample point. 

It is shown in the appendix that an asymptotic expression for the expectation value 
and the variance of a^ can be obtained under very weak assumptions about the joint 
probability distribution 



Poia,w) = 
of a and w. To first order in n~^ 



6 {a — a(x)) 5 {w — w{x)) pQ{x)dx 



_ (aw) jaw) (w^) - jw) (aw^) 
{w) n (w) 

2 _ {a'w^) {wf - 2 (g^;) {aw^) jw) + {awf {w^) 

a [an) ^ ; -4 . [^■'J) 

n [w) 

Eq. (|2.2p shows that for any finite number of samples the re-weighted average is a biased 
estimator of (a) , i.e., that (a„) is affected by a systematic error which decreases with 

n~^, while Eq. (|2.3p shows that the statistical error in the re- weighted mean decreases 
asymptotically as n~^'^. 

In order to perform a comparison between the weighted and the un- weighted sampling 
strategies, it is necessary to evaluate the coefficients which enter Eqs. (|2.2p and (j2.3p . 
To do this we introduce an additional assumption, namely that a and h=—\nw are 
correlated Gaussian variates, with means (a) and 0, variances o"^(a) and a'^{h), and 
covariance (a/i), all with respect to pQ. We introduce h because in many cases - certainly 
the vast majority of those occurring in the physical sciences - the probability density 
is proportional to the exponential of an extensive state function (e.g. a difference 
Hamiltonian). It is not difficult to justify the assumption of Gaussian statistics, since 
the observable and the logarithm of the weight are often computed by summing a large 
number of fluctuating contributions, and will therefore have nearly Gaussian statistics by 
virtue of the central limit theorem. In any case the objective here is simply to discuss the 
qualitative features of re- weighted sampling, and the Gaussian limit provides a convenient 
framework in which to do this. 

Under the assumption of a Gaussian joint probability distribution for (a, /i), all 
expectation values of the form {a^w'^) can be computed analytically in terms of the 
parameters of the distribution. Eqs. ()2.2p and ()2.3p then simplify to 



{an) ~ (a)-(a/i) + (a/i)e'^'(''V^, (2-4) 

cj2(a„) « (a2(a) + (a/i)2)e"'('^V«- (2-5) 
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These equations convey in a more transparent way the message that is inherent in their 
more general counterparts. The systematic bias depends on the cross-correlation between 
the observable a and h = — Inw, and both the systematic and statistical error grow 
exponentially with a'^{h). Whenever the fluctuations of h are smaller than 1, there will be 
little difference between the efficiency of sampling directly based on the target distribution 
and that of the re-weighted method. However, as soon as (T{h) becomes larger than one, 
both the systematic bias and the statistical error of the re-weighted approach will require 
a much larger number of samples for convergence to within a given threshold. 



a- -(h) = l<ah>=l- 



a- -(h)=4,<ah>=l 




Figure 1. Numerical results for the mean and standard deviation in the mean for n independent samples 
obtained from re- weighting a Gaussian-distributed observable with zero mean and variance o"^(a) = 1, 
based on a difference Hamiltonian h with variance cr^{h), and cross-correlation (ah) with a. The full lines 
correspond to the predictions of Eqs. (|2.4p and (|2.5p . 



In Figure [T] we present the results of some numerical experiments in which we 
generated correlated Gaussian variables a and h, and computed the weighted average 
of a according to ()1.3p . The results confirm the asymptotic nature of our estimates, 
which describe the behaviour of {(in) and a'^{an) in the large-n regime. The small-n limit 
can instead be pin-pointed very easily by considering the n = 1 case, in which the average 
converges to the un- weighted value (a), and the variance is just a'^{a). 

While it is common knowledge that the fluctuations of the "difference Hamiltonian" 
h = — Inw must be small in order to apply re- weighting successfully, we believe it will 
come as a surprise to many that fluctuations of the order of a few units in h will decrease 
the sampling efficiency by orders of magnitude. The message here is that in re-weighted 
simulations the joint probability distribution of a and h should be monitored carefully. 
The value of exp[(7^(/i)] provides a direct indication of the statistical inefficiency due to 
re-weighting, and the cross-correlation {ah) provides an equally direct indication of the 
systematic bias in {a„). Both of these are straightforward to converge, as they involve 
un-weighted averages. 
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While Eqs. ()2.4p and ()2.5p cannot be used quantitatively in the case of non-Gaussian 
statistics, they provide useful guidelines to judge the quality of sampling in a re- weighted 
calculation. Whenever the fluctuations of h are large and the weighted average is much 
closer to (a) than to (a) — (ah), it is very likely that the numerical results obtained 
from (|1.3p will be meaningless. In order to provide an illustration of the utility of these 
estimates, we will now show that they can be used to predict the situations in which 
high-order PI schemes based on re- weighted sampling can be used successfully, and those 
in which they cannot. 



3. High order path integration 

(a) Suzuki- Chin path integration 

Path integral methods aim at approximating the quantum mechanical partition 
function, Z = TTe~^^ , by factoring it in such a way that the resulting expression can 
be evaluated within a classical framework. The simplest such factorization makes use 
of the well-known Trotter product formula (l8l |. and results in an expression which is 
equivalent to a purely classical partition function in an extended phase space consisting 
of many replicas (beads) of the system, connected by harmonic springs to form a closed 
path (necklace) (lOKl2l. [l9l | . The overall Trotter Hamiltonian reads 

^^"^=E|i + T(f)'('^.-'^.>i)^ + ^('?.)' (3-1) 

where {qj} and {pj} are the coordinates and momenta of the different beads, and V{q) is 
the physical potential. The classical partition function computed with this Hamiltonian 
at P times the physical temperature converges to the fully quantum mechanical partition 
function Z, with an error that is second-order in /3/P. The number of beads necessary for 
convergence is in general a small multiple of /3hujmax, where ujmax is the fastest physical 
vibration in the system. The computational cost therefore becomes very large for low 
temperatures and/or stiff vibrational modes, and considerable effort has been devoted to 
the quest for faster-converging factorizations in order to reduce this cost. 

It has been shown (20l | that factorizations with an error that is of an order higher than 
two in /3/P must contain composite operators which have the form of nested commutators 
between the kinetic energy operator T and the potential V. The complexity of the nested 
commutators increases with the order of the factorization and is only really manageable 
up to fourth-order. Among the different fourth-order factorizations developed so far [Ij- 
la,l2l||, the one proposed by Suzuki [Ij| and simplified by Chin [l5l | has better convergence 



properties than most others. It also only contains the double commutator [y, [T, V^]], 
which is proportional to the square modulus of the force and hence of a manageable 
complexity. 

This Suzuki-Chin (SC) factorization results in a Hamiltonian 



P ^2 /ot\2 



i=i ^ ^ 
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which contains a bead-dependent modified potential 



2 



V,ig„(3)=w,Viq,) + ^l^^^ \V'ig,)\\ (3.3) 

The coefficients Wj and dj take different values for odd and even beads, namely iU2j = 2/3, 
W2J+1 = 4/3, d2j = a/6, d2j+i = (1 — a)/12, where a is a free parameter which can be 
varied in the range [0, 1] . While tuning a can improve the convergence in specific cases 
[la.[l7l|. there is no universally optimal choice, so in the present work we used a = 0. 

The modified Hamiltonian resulting from the SC factorization contains the square 
modulus of the physical force. A path integral molecular dynamics (PIMD) calculation 
based on the corresponding classical equations of motion therefore requires one 
to evaluate the derivative of the squared force term, which contains the Hessian. 
Unfortunately, this is prohibitively expensive to evaluate for large systems with all 
but the simplest inter-atomic potentials. One way around this difficulty is to exploit 
the possibility of sampling based on a quasi- Trotter Hamiltonian which resembles 
Eq. (|3.ip but has the physical potential on each bead weighted by Wj (l6l |. The 
statistics corresponding to the SC Hamiltonian can then be recovered by re-weighting 
configurations with the weighting factor w = e~ = exp[—f3AH/P), where the difference 
Hamiltonian has the form 

The effect that re-weighting configurations based on (|3.4|) has on the statistical efficiency 
of sampling is the fundamental drawback of this approach, as will be discussed in the 
next subsection, based on the general results of Section [2] 

(6) The curse of system size 

The high-order path integral strategy we have just described may appear to be 
superior to Trotter PI in all respects (la |. However, one must not overlook the question 
of statistical efficiency, which could imply that longer trajectories are required to obtain 
the same level of sampling accuracy, a factor which must be considered when comparing 
the computational viability of different approaches. 

Given the analysis in Section [21 all we need to evaluate the errors due to finite 
sampling in this context is an estimate of the variance of AH. Consider first the 
case of a simple harmonic oscillator with frequency to, in the strongly-quantum regime, 
where the distribution of configurations is well approximated by the ground-state density 
P{q) (xexjp{—mujq^/h). Then, the squared fiuctuations of |^'(g)P read 

{\V\qt)-{\r{q)\')' = h'm'u:V2. 

Under the simplifying assumption that the statistics of q for a finite-P path integral 
simulation are the same as those of the quantum oscillator at full convergence, one can 
write down the squared fiuctuations of the temperature-scaled difference Hamiltonian 

h = (3AH/P as 

a^i^AH/P) w {P/2)''{phuj/Pf/162, (3.5) 

where k varies between one (if one assumes that there is no correlation between the forces 
on different beads) and two (if one assumes that the forces on all beads are the same) . 
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Eq. (|3.5p implies that it is possible to reduce the fluctuations of the weights w = e~ 
at will by increasing the number of replicas. However, one should remember that keeping 
this number low was the primary reason for attempting a high-order factorization in 
the first place. Since the relative error in the energy due to the discretization of the 
fourth order PI grows as (^huj/P)^, it is clear that - for a given discretization error - 
the sampling problems inherent in re-weighted SC PIMD are bound to get worse as the 
quantum nature of the problem becomes more pronounced. 

Moreover, the issue of statistical efficiency is exacerbated once one considers a system 
with / degrees of freedom. Since the difference Hamiltonian is size-extensive, its variance 
will scale linearly with /, leading to an exponential dependence of the statistical and 
systematic finite-sampling errors on the size of the system. For a multi-dimensional 
harmonic system with frequencies ojj one can estimate - in the worst-case scenario of 
complete correlation of the forces on different replicas - that the total squared fluctuations 
will be of the order of 

In practice, this means that the high-order path integral integration scheme which 
we have described in the previous section will yield a small statistical uncertainty up to 
a critical system size, above which the statistical efficiency will decrease exponentially. 
Our analysis of the harmonic limit yields an estimate of this critical size for a given value 
of /3hiJ. For instance, flexible water can be simulated successfully by Trotter PI with 
P > 32, and has u ss 2700 cm'^ For T = 300 K and P = 32, Eq. ([32]) suggests that the 
fluctuations in the difference Hamiltonian will be of the order of one in a simulation of a 
few tens of water molecules. This is the sort of system size up to which re-weighted SC 
PIMD is therefore predicted to be advantageous. 

(c) An example: liquid water 
In order to verify the predictions of Eq. (|3.6p and those of Section [2] for a realistic 



condensed-phase example, we have examined the case of a flexible water model J2J 
simulated at a temperature of 298 K and at the experimental density. We chose to perform 
simulations with 216 water molecules, so as to be in a regime in which - according to the 
arguments of Section 3b - the statistical inefficiency of re- weighting should be apparent. 
We explored numbers of beads ranging between P = 4 and P = 128, and for each set 
of parameters performed 16 independent simulations, each of which comprised 20 ps 
for the initial equilibration and 1 ns during which averages were accumulated. The 
calculations were accelerated using the ring-polymer contraction technique [2j|, with the 
same parameters as we have employed in previous simulations [22|. Canonical sampling 
was enforced by applying targeted Langevin thermostats to the internal modes of the 
ring polymer, and a global stochastic velocity rescaling with a time constant of 10 fs 
to the centroid (25|, l2a| • The quantum mechanical kinetic energy of the water molecules 
was accumulated using the modified form of the centroid virial estimator propose d by 



Yamamoto [l7|. This estimator, which is based on a scaling of fluctuation coordinates |27| . 
can be evaluated by finite differences thereby avoiding the need for high-order derivatives 
of the potential. 

As an initial test, we examined the convergence of the average kinetic and potential 
energies of the liquid as a function of the number of replicas, for both Trotter PI and 
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Figure 2. The two panels show the convergence of the average potential and kinetic energy for a simulation 
of 216 water molecules as a function of the number of beads P. The reference value - shown as a black 
dashed line - corresponds to Trotter PI with P — 128. The results from the PI-GLE accelerated Trotter 
PIMD are also reported [221 • Error bars were calculated, but are smaller than the data points on this 
scale. For the SC simulations we also report the averages of the observables computed without weights 
(joined by dashed lines) and those obtained from the infinite n limit of Eq. (|2.4|l (continuous lines). 



re-weighted SC PI (see Figure El) . As expected, and as witnessed in previous calculations 
with a similar water model jl6|, the higher order of the PI factorization does translate 
into faster convergence. However, for the simulations with P < 32, the statistical error in 
the mean for the re-weighted SC method was found to be considerably larger than that 
of Trotter PIMD. Even more worrying, for re- weighted SC PIMD there is a significant 
difference between the weighted average of the kinetic energy and the asymptotic result 
predicted for Gaussian statistics [the n— )-cxd limit of Eq. (|2.4p ]. These issues will be 
considered in more detail below. As a final remark, it can be seen from Figure [2] that 
the recently-developed stochastic PI+GLE |22 | approach manages to obtain even faster 
convergence than SC PIMD, without any of the sampling issues that we are about to 
elucidate. 

It remains to be verified whether the scenario presented in Section 3b based on 
a harmonic model and an analysis of the sampling efficiency in the Gaussian limit 
corresponds to the numerical finding in this real example. First of all, one needs to verify 
whether the scaling of the variance of the logarithm of the weights follows Eq. ()3.6p . 
Figure [3] demonstrates that both the scaling with P and with the number of degrees 
of freedom correspond to the predictions we have made, even if the quantitative values 
for (j^(/3Ai7/P) are lower by about a factor of 3 than the analytical estimates obtained 
from the harmonic model. Taking this correction into account increases to about 100 
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Figure 3. The three panels show the fluctuations of the temperature-scaled difference Hamiltonian h — 
J3/S.H/P for a simulation of a flexible water model at T = 298 K, for different numbers of replicas P and 
numbers of water molecules Nw- The top left panel shows the data points as a function of Nw', the 
dashed lines are guides for the eye corresponding to a^ oc Nw behaviour. The bottom left panel shows 
the points as a function of P; here the dashed lines correspond to a^ ocP"'*. The right panel shows a 
contour plot of the interpolant of the data points as a function of P and A^'vi^, with a^ denoted by the 
colour scale. Red indicates regions in which the statistical inefficiency of re- weighted Suzuki-Chin PIMD 
is unmanageable, while blue indicates regions in which it is comparable to that of Trotter PIMD. Note 
that for a given P, the inefficiency gets worse with increasing system size. 



the number of water molecules for which re-weighted SC PIMD is computationally 
reasonable. 

Having verified that the fluctuations in the difference Hamiltonian closely obey 
Eq. (|3.6|) . let us discuss the validity of the relations between these fluctuations and 
the error in the mean that we have worked out in Section [2l Figure U] shows that - for 
the case of P = 16, where the variance of the temperature-scaled difference Hamiltonian 
h = (3^H/P is much larger than one - the trends do indeed correspond to those derived 
in the limit of Gaussian statistics. Averages computed from short trajectories suffer a 
large systematic bias, and they slowly approach the limit predicted from the cross- 
correlation between the observable and the difference Hamiltonian as the number of 
samples is increased. To obtain a converged expectation value and therefore verify the 
accuracy of the prediction based on Gaussian statistics, a prohibitively long trajectory 
would be required. Also as anticipated, the upper panel of Figure 2] clearly demonstrates 
that the statistical variance in the mean for re-weighted SC PIMD decreases much more 
slowly than for Trotter PIMD, completely offsetting the benefits of using the higher-order 
propagator. 

It is important to remark that we have been able to make these issues stand out 
clearly because we have considered a relatively simple problem, for which we could run 
long trajectories and gather extensive statistics. In real applications one can rarely afford 
to perform such a detailed error analysis, and in this respect the analytical results we 
have derived provide a useful litmus test. The fluctuations of the temperature-scaled 
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Figure 4. The two panels show the convergence of the mean kinetic energy for a simulation of 216 water 
molecules, at T = 298 K and for P = 16, as a function of the length of the simulation, together with the 
relative fluctuations in the mean. In the lower plot, the un- weighted average kinetic energy and that 
obtained from the infinite n limit of Eq. (|2.4p are also shown. 

difference Hamiltonian h should be smaller than one, since for larger fluctuations the 
statistical inefficiency rapidly becomes unmanageable. Even more interesting is the cross- 
correlation between the observable a and h, which indicates the extent of the systematic 
bias. 



4. Conclusions 



In this paper we have performed a careful assessment of the statistical efficiency of re- 
weighted sampling. Our analysis shows a dramatic degradation in performance when 
the squared fluctuations of the logarithm of the weight exceed one. While it is common 
knowledge that large fluctuations in the weights lead to failure of such techniques, and 
that the size-extensivity of the difference Hamiltonian makes these methods unsuitable 
for the study of large-scale problems, our analysis shows clearly how dramatic this effect 
can be, and also that the problem is quite insidious. 

In fact, insufficient sampling in a re- weighted calculation is very difficult to detect. 
It mainly manifests itself through the error in the mean decreasing more slowly than 
the inverse square root of the number of samples even after a very large number of 
samples have been accumulated, indicating that the asymptotic regime has not yet been 
reached (see the upper panel of Fig. 4). An even greater concern is that finite sampling 
implies a bias in the estimate of the observable, which decreases very slowly and would be 
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exceedingly hard to detect in a more complex - or computationally demanding - problem 
(see the lower panel of Fig. 4). 

As a demonstration of how these results can be used, we have examined the 
applicability of high order path integral factorizations to condensed-phase simulations. 
The most promi sing of these techniques relies on re-weighting based on a difference 
Hamiltonian (la . \17^ . We have worked out the dependence of the fluctuations in this 
quantity on the properties of the system being studied, the number of degrees of freedom 
involved, and the number of path integral beads. We have shown that for large and/or 
strongly quantum systems, a large number of replicas is required, not so much in order 
to converge the asymptotic expectation values of quantum observables, but to keep the 
fluctuations in the re-weighted average under control. Our analytical results allow one 
to predict the break-even point at which the advantages of high-order path integrals 
are lost due to statistical inefficiency. These predictions have been qualitatively verified 
by numerical experiments on the simulation of a flexible model of water under ambient 
conditions. For this system, high-order SC PIMD is advantageous for simulations of up 
to around 100 water molecules, but no more. 

While a niche for the use of Hessian-free high-order PIMD therefore exists for the 
study of small clusters, one must consider that accelerated PIMD based on stochastic 
dynamics |22l | exhibits comparable or faster convergence of quantum observables, without 
being affected by any of the statistical issues we have focussed on here. We firmly believe 
that this stochastic acceleration is therefore a far more promising approach to large scale 
path integral simulations. 
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An asymptotic expression for re-weighted averages 

In this appendix we present a derivation of the asymptotic expansion of the expectation 
value of a re-weighted average of n terms from a general distribution. As usual, we say 
that the formal sum YI^q Cjn~^ is an asymptotic expansion for /„ if for any fixed k 



converges. Our main result is as follows: 



we have fn = '^j=QCjn ^ + 0{n (^■+^)) as n— )-cxd. This does not imply that the sum 



Theorem 1. Let {ai,Wi)'^^^ be n independent samples from a (correlated) joint 
distribution pQ{a,w). Suppose that w takes only positive values, and that all moments 
of a, w and w~^ are finite. Let 

n n 

an = '^aiWi/'^Wi. (A.l) 

4 = 1 4 = 1 

Then {(in) and cr^(a„) = (a^) — {(in) have asymptotic expansions in n whose coefficients 
are determined by the joint moments of po{a,w), with the leading terms in these 
expansions being given by Eqs. {\2.2\i and (|2.3|) . 
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The argument below will show how to calculate the higher-order coefficients of the 
expansion, although there does not seem to be a simple explicit formula for the general 
coefficient Cj. 

Multiplying the distribution u; by a constant (i.e., multiplying all Wi by the same 
constant) does not affect {(in), so we assume in the proof that {w) = 1. With this 
assumption, Eqs. (|2.2p and ()2.3p simplify to 

(a„) = {aw) + n~'^ {{aw){w'^) -{aw"^)) +0{n-'^), (A.2) 

a2(a„) = n^U{aW)-2{aw){aw'^) + {awf{w^))+0{n~^). (A.3) 

Explicit calculations can be further simplified by subtracting a constant from a (and 
hence from {(in)) to ensure that {aw) =0, but we shall not do this in the following. 
Since all the samples are independent and identically distributed, one can write 

where z„ = Y17=i '^i ^^ ^^^ ^^^^ '^^ '^ independent copies of w, and Zn-i (but not Zn) is 
independent of the correlated pair (a,w). 

Let w = w — 1 and z„ = z^ — n = Y17=i ^^ ^^ ^^^ centralized versions of w and z„. We 
first consider the properties of Zn- Even in the (canonical) special case where w is log- 
normal, not much can be obtained analytically about the distribution of Zn |28l |. However, 
one can obtain its moments using the general formula 

«W=^W_^(^^-lV«;,(fc-) (A.5) 

relating the cumulants k^^' and moments fj,^^' of a distribution. When fi^^' = (and hence 
K^^' = 0)), as is the case for the centred variables w and z„, Eq. (jA.Sp can be written as 

'-' ^k - 1 



^vv=^(A:)_^r-M^{i)^(fc-i). (A.6) 



We therefore first calculate the cumulants k\ of the distribution of w using Eq. ()A.6p . 

The cumulants of Zn are simply Rn = nK\ , and the moments jln of Zn can be recovered 

by inverting Eq. ()A.6P . By induction on k we find that /in is a polynomial in n of degree 
\_k/2\ ; this can also be seen directly by expanding the expectation of the kth power of 
the sum of the Wi. 

An asymptotic expansion for (a„) can now be developed by expanding the 
denominator in Eq. (|A.4p about its mean. Let X = Zn/n = Zn/n — 1. Using the identity 

1/(1 + x) = E'^jZoi-^)^ + (-^)V(1 + x), we have 

,_ V ,'anWn\ I anWn 
{an) 



Zn/n I \ 1 -I- X 



k 1 / ^r]i 



Y.{-\y {anWnX^) + (-1)" LnWn^^) ■ (A.7) 
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Since X = (zn-i + Wn)/n and Zn-i is independent of (a„, Wn), the terms in the sum can 
be written out in terms of the moments fj-l^li of Zn-i'- 

{-ly {anWnX^) = {-nr JZ i^) («(^ + 1)*') f^-?- (A.8) 

For each j, the term described by Eq. (jA.SP is a (Laurent) polynomial in n with powers 
between n"'-''^' and n~K We claim that 

4') = (a^^^-^^^ = 0(n-^-/2)^ (A.9) 

so the residual term i?n shrinks with n. Moreover, summing Eq. (jA.SP over j < 2(i and 
collecting coefficients gives the first terms of the asymptotic expansion of (a„) down to 
order n~ . For example, setting d=l and discarding the terms with an n~^ dependence 
yields Eq. (jA^ . 

The bound Eq. (|A.9P is perhaps most easily seen by applying the Cauchy-Schwartz 
inequality twice, or Holder's inequality once, to show that in general |(AiA2j43j44)| < 

J][j^-^ {^j4^) . Here Ai=an and A2 = Wn contribute constant factors to the product. 

With A3 = X^ = {zn/n)^ we have {Aj) = n~^^fl'n''^ = 0{n-^''), so {A^)^^^ has the 
required order 0{n~^'^). Finally, A| = (1 + X)~^ = (z„/n)~^. Since the function rc~^ 
is convex, and Zn/n is just the average of the Wi, we have Af<Y^^^^w~ /n, so 
(Af) < {w~'^\ which is by assumption a finite constant, and the bound Eq. ()A.9p follows. 
For the variance, the argument is very similar, now using the fact that 

/_2\ / .. / an-lWn-ianWn\ , / (^W^ 

(a„) =n[n- 1 ( ^-- —-, — ) +n' 



n2(l+A:)2 / \n2(l + X)2 

and the identity (1 + x)'"^ = E^=d(j + '^){-xV + (-l)''^^^^^(rSp^~~- ^ similar method 
applies to higher moments of a„. 

Note that the assumption that all negative moments of w are finite was only used to 
control the tail behaviour of 1/(1 + X) = {zn/n)~^ near 0. In fact, this has much better 
tail behaviour than w~^, so much weaker conditions suffice. For example, splitting the 

sum Zn into groups of m and noting that z^ < Jli^i ''^j ' °^^ "^^^ check that the 
result holds assuming only that {w~°^) is finite for some positive a. 

Finally, we should emphasize that one cannot expect the infinite sum corresponding 
to Eq. (A. 7), or the asymptotic expansion of (a„), to converge for any fixed n. Indeed, 

in the log-normal case where w = e~ , we have fi\ = (^w 'j =ex.p{a'^{h)k'^ /2), which 
becomes exp[a'^[h)k{k — l)/2) after normalizing so that {w) = l. Moreover, the /cth 

central moment jl\ also contains a component proportional to ^\ (i^ is a linear 
combination of this and lower moments). Hence, Eq. (jA.SP will contain a component 

proportional to [exp ((T^(/i)/2)] /n^~^. For any n, this eventually increases with j, 

so the infinite sum does not converge. Similar considerations strongly suggest that the 
first few terms in the asymptotic expansion will be a good approximation only if (j'^{h) 
is small compared to (at most a certain constant times) log n. 
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